Cognate and Misspelling Features for Natural Language Identification

Authors

  • Garrett Nicolai
  • Bradley Hauer
  • Mohammad Salameh
  • Lei Yao
  • Grzegorz Kondrak

Abstract

We apply Support Vector Machines to distinguish among 11 native languages in the 2013 Native Language Identification Shared Task. We extend a set of common language identification features with features that capture cognate interference and spelling mistakes. Our best results are obtained with a classifier that combines the cognate and misspelling features with word unigrams, word bigrams, character bigrams, and syntax production rules.
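The word and character n-gram features listed above can be illustrated with a minimal sketch. It assumes lowercase whitespace tokenization and uses a hypothetical helper, `extract_ngram_features`. It is not the authors' implementation, and it does not cover the cognate, misspelling, or syntax production-rule features.

```python
from collections import Counter


def extract_ngram_features(text: str) -> Counter:
    """Count word unigrams, word bigrams, and character bigrams.

    Hypothetical helper: lowercase whitespace tokenization is an
    assumption, not the preprocessing used in the paper.
    """
    features = Counter()
    lowered = text.lower()
    words = lowered.split()

    # Word unigrams, prefixed so feature types do not collide.
    for w in words:
        features[f"w1:{w}"] += 1

    # Word bigrams over adjacent tokens.
    for a, b in zip(words, words[1:]):
        features[f"w2:{a}_{b}"] += 1

    # Character bigrams over the lowercased text, spaces included.
    for i in range(len(lowered) - 1):
        features[f"c2:{lowered[i:i + 2]}"] += 1

    return features


if __name__ == "__main__":
    feats = extract_ngram_features("The cat the")
    print(feats["w1:the"], feats["w2:the_cat"], feats["c2:th"])
```

In a setup like the one described, sparse counts of this kind would typically be mapped to a feature vector and passed to a linear SVM alongside the other feature groups; that step is omitted here.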

Publication date: 2013